{
 "cells": [
  {
   "cell_type": "markdown",
   "source": [
    "# Using Batch API to call GLM models\n",
    "\n",
    "**This tutorial is available in English and is attached below the Chinese explanation**\n",
    "\n",
    "本章节将带领开发者使用 Batch API 来调用GLM系列模型。使用 Batch API将能以更低的价格调用模型，但是，需要等待一天的时间（batch API 是 非事实服务的）。\n",
    "我们将按照[官方文档](https://open.bigmodel.cn/dev/howuse/batchapi)提供的方式进行调用。\n",
    "\n",
    "This cookbook aims to help developers use the Batch API to call the GLM series of models. Using the Batch API, you can call the model at the price of the queue, but you need to wait for a day (the batch API is not a de facto service).\n",
    "We will call it according to the method provided by [official documentation](https://open.bigmodel.cn/dev/howuse/batchapi)."
   ],
   "metadata": {
    "collapsed": false
   },
   "id": "55157c82075cb745"
  },
  {
   "cell_type": "markdown",
   "source": [
    "# 1. Set up the API key\n",
    "\n",
    "首先，设定好调用模型的API key\n",
    "First, set the API key for calling the model"
   ],
   "metadata": {
    "collapsed": false
   },
   "id": "292831532b8249fc"
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "initial_id",
   "metadata": {
    "collapsed": true,
    "ExecuteTime": {
     "end_time": "2024-05-25T06:39:15.233121Z",
     "start_time": "2024-05-25T06:39:14.988702Z"
    }
   },
   "outputs": [],
   "source": [
    "from zhipuai import ZhipuAI\n",
    "\n",
    "client = ZhipuAI(api_key=\"your zhipu ai api key\")"
   ]
  },
  {
   "cell_type": "markdown",
   "source": [
    "# 2. Upload data set\n",
    "\n",
    "接着，我们需要提前按照格式准备好的数据集，并上传到智谱的服务器上。这里我们使用一个情感分类的数据集，您可以在同一个Json中包含多个文本并有多个不同的配置文件细节，例如，您可以指定某一条数据使用哪个模型，设置模型的参数和输出长度等。在本demo中，我已经为大家提供了一个数据集。这里的数据集均使用了`GLM-4`模型。\n",
    "\n",
    "Next, we need to prepare the data set in advance according to the format and upload it to zhipuai's server. Here we use an emotion classification data set. You can include multiple texts in the same Json and have multiple different configuration file details. For example, you can specify which model to use for a certain piece of data, and set the parameters and output length of the model. In this demo, I have provided a data set for you. The data set here all use the `GLM-4` model."
   ],
   "metadata": {
    "collapsed": false
   },
   "id": "57114b58720de7df"
  },
  {
   "cell_type": "code",
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1716468180_59c7ca593a0b4384bfda9d85cc64181d\n"
     ]
    }
   ],
   "source": [
    "result = client.files.create(\n",
    "    file=open(\"data/batch_input.jsonl\", \"rb\"),\n",
    "    purpose=\"batch\"\n",
    ")\n",
    "print(result.id)"
   ],
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2024-05-23T12:43:00.408665Z",
     "start_time": "2024-05-23T12:43:00.200497Z"
    }
   },
   "id": "c7e8defdea355c58",
   "execution_count": 6
  },
  {
   "cell_type": "markdown",
   "source": [
    "# 3. Create a batch task\n",
    "\n",
    "接着，我们需哟根据我们的上传任务，创建一个batch任务。在这个任务中，我们需要指定我们的输入文件，输出文件，以及我们的模型的endpoint。我们的endpoint是`/v4/chat/completions`。\n",
    "\n",
    "\n",
    "Next, we need to create a batch task based on our upload task. In this task, we need to specify our input files, output files, and the endpoint of our model. Our endpoint is `/v4/chat/completions`."
   ],
   "metadata": {
    "collapsed": false
   },
   "id": "c84acc3707b9d4a2"
  },
  {
   "cell_type": "code",
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Batch(id='batch_1793623747699552256', completion_window='24h', created_at=1716468200026, endpoint='/v4/chat/completions', input_file_id='1716468180_59c7ca593a0b4384bfda9d85cc64181d', object='batch', status='validating', cancelled_at=None, cancelling_at=None, completed_at=None, error_file_id=None, errors=None, expired_at=None, expires_at=None, failed_at=None, finalizing_at=None, in_progress_at=None, metadata={'description': 'Sentiment classification'}, output_file_id=None, request_counts=BatchRequestCounts(completed=None, failed=None, total=None))\n"
     ]
    }
   ],
   "source": [
    "create = client.batches.create(\n",
    "    input_file_id=\"1716468180_59c7ca593a0b4384bfda9d85cc64181d\",\n",
    "    endpoint=\"/v4/chat/completions\",\n",
    "    completion_window=\"24h\",\n",
    "    metadata={\n",
    "        \"description\": \"Sentiment classification\"\n",
    "    }\n",
    ")\n",
    "print(create)"
   ],
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2024-05-23T12:43:19.957434Z",
     "start_time": "2024-05-23T12:43:19.817403Z"
    }
   },
   "id": "1158db7765bfbbaa",
   "execution_count": 8
  },
  {
   "cell_type": "markdown",
   "source": [
    "# 4. Get task status\n",
    "我们的任务已经创建，接下来我们需要获取任务的状态。我们可以通过任务的id来获取任务的状态。（在这个查询结果中，我等待了24小时才进行查询，确保任务已经完成并返回了结果)\n",
    "\n",
    "Our task has been created, now we need to get the status of the task. We can get the status of the task through the task id. (In this query result, I waited 24 hours before querying to ensure that the task was completed and the results were returned)"
   ],
   "metadata": {
    "collapsed": false
   },
   "id": "dddbbf32d61a55ca"
  },
  {
   "cell_type": "code",
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Batch(id='batch_1793623747699552256', completion_window='24h', created_at=1716468200000, endpoint='/v4/chat/completions', input_file_id='1716468180_59c7ca593a0b4384bfda9d85cc64181d', object='batch', status='completed', cancelled_at=None, cancelling_at=None, completed_at=1716473500000, error_file_id='', errors=None, expired_at=None, expires_at=None, failed_at=None, finalizing_at=1716473460000, in_progress_at=1716468310000, metadata={'description': 'Sentiment classification'}, output_file_id='1716473500_874dd086fcc34f4fb56af30926ff959a', request_counts=BatchRequestCounts(completed=3, failed=0, total=3))\n"
     ]
    }
   ],
   "source": [
    "batch_job = client.batches.retrieve(batch_id=\"batch_1793623747699552256\")\n",
    "print(batch_job)"
   ],
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2024-05-25T06:39:44.042183Z",
     "start_time": "2024-05-25T06:39:43.744855Z"
    }
   },
   "id": "11eb6fc9b962cc4",
   "execution_count": 4
  },
  {
   "cell_type": "markdown",
   "source": [
    "# 5. Get task status\n",
    "在等待一天的时间后，我们可以通过任务的id来获取任务的状态，并将运行结果下载到本地。运行结束后，我们通过查看任务的状态，可以获取`output_file_id`。我们就可以获取解答的文件并下载。\n",
    "\n",
    "After waiting for a day, we can get the status of the task through the task id and download the running results to the local. After running, we can get `output_file_id` by checking the status of the task. We can get the answer file and download it."
   ],
   "metadata": {
    "collapsed": false
   },
   "id": "e3244b521b25c6d7"
  },
  {
   "cell_type": "code",
   "outputs": [],
   "source": [
    "content = client.files.content(file_id=\"1716473500_874dd086fcc34f4fb56af30926ff959a\")\n",
    "content.write_to_file(file=\"data/batch_output.jsonl\")"
   ],
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2024-05-25T06:40:34.889027Z",
     "start_time": "2024-05-25T06:40:34.652862Z"
    }
   },
   "id": "1526e697a8a44072",
   "execution_count": 6
  },
  {
   "cell_type": "markdown",
   "source": [
    "保存的文件`batch_output.jsonl` 已经同步上传到本案例中的`data`文件夹中。\n",
    "\n",
    "The saved file `batch_output.jsonl` has been synchronously uploaded to the `data` folder in this case."
   ],
   "metadata": {
    "collapsed": false
   },
   "id": "dcbd9a7d6e6627b9"
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
